Portable Lexical Analysis for Parsing of Morphologically-Rich Languages

نویسندگان

  • Marek Medved
  • Milos Jakubícek
چکیده

In this paper we present new approach to lexical analysis in the Synt parser. We describe three fast lexical analyzers we have exploited for lexical analysis and advantages of the re2c fast lexical analyzer in comparison to others. This paper shows a new lexical analysis workflow which is both easy to maintain and portable to new languages. Finally we provide an evaluation of the new lexical analysis against the original lexical analysis.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Special Techniques for Constituent Parsing of Morphologically Rich Languages

We introduce three techniques for improving constituent parsing for morphologically rich languages. We propose a novel approach to automatically find an optimal preterminal set by clustering morphological feature values and we conduct experiments with enhanced lexical models and feature engineering for rerankers. These techniques are specially designed for morphologically rich languages (but th...

متن کامل

Knowledge Sources for Constituent Parsing of German, a Morphologically Rich and Less-Configurational Language

We study constituent parsing of German, a morphologically rich and less-configurational language. We use a probabilistic context-free grammar treebank grammar that has been adapted to the morphologically rich properties of German by markovization and special features added to its productions. We evaluate the impact of adding lexical knowledge. Then we examine both monolingual and bilingual appr...

متن کامل

Joint Morphological and Syntactic Analysis for Richly Inflected Languages

Joint morphological and syntactic analysis has been proposed as a way of improving parsing accuracy for richly inflected languages. Starting from a transition-based model for joint part-of-speech tagging and dependency parsing, we explore different ways of integrating morphological features into the model. We also investigate the use of rule-based morphological analyzers to provide hard or soft...

متن کامل

An LR-inspired generalized lexicalized phrase structure parser

The paper introduces an LR-based algorithm for efficient phrase structure parsing of morphologically rich languages. The algorithm generalizes lexicalized parsing (Collins, 2003) by allowing a structured representation of the lexical items. Together with a discriminative weighting component (Collins, 2002), we show that this representation allows us to achieve state of the art accurracy results...

متن کامل

تأثیر ساخت‌واژه‌ها در تجزیه وابستگی زبان فارسی

Data-driven systems can be adapted to different languages and domains easily. Using this trend in dependency parsing was lead to introduce data-driven approaches. Existence of appreciate corpora that contain sentences and theirs associated dependency trees are the only pre-requirement in data-driven approaches. Despite obtaining high accurate results for dependency parsing task in English langu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013